Plasmodium falciparum serology: A comparison of two protein production methods for analysis of antibody responses by protein microarray

The evaluation of protein antigens as putative serologic biomarkers of infection has increasingly shifted to high-throughput, multiplex approaches such as the protein microarray. In vitro transcription/translation (IVTT) systems–a similarly high-throughput protein expression method–are already widely utilised in the production of protein microarrays, though purified recombinant proteins derived from more traditional whole cell based expression systems also play an important role in biomarker characterisation. Here we have performed a side-by-side comparison of antigen-matched protein targets from an IVTT and purified recombinant system, on the same protein microarray. The magnitude and range of antibody responses to purified recombinants was found to be greater than that of IVTT proteins, and responses between targets from different expression systems did not clearly correlate. However, responses between amino acid sequence-matched targets from each expression system were more closely correlated. Despite the lack of a clear correlation between antigen-matched targets produced in each expression system, our data indicate that protein microarrays produced using either method can be used confidently, in a context dependent manner, though care should be taken when comparing data derived from contrasting approaches.


Introduction
To date, the majority of malaria serologic studies have focussed on antibody responses to a small number of well-characterised, highly immunogenic Plasmodium falciparum antigens that have proven to be reliable markers of exposure to infection [1][2][3][4][5][6][7][8]. However, P. falciparum expresses more than 5000 proteins, each a potential antibody target [9,10]. Advances in technology have led to the development of new assay platforms that allow proteome-scale investigation of antibody responses, such as the protein microarray [11,12], which offers significantly greater experimental throughput than more classical monoplex methods (e.g. ELISA) [13,14]. The ability to simultaneously interrogate large numbers of putative targets, using low volumes of sample, significantly increases the rate at which an individual's antibody responses to antigens can be characterised. As such, protein microarray based approaches to biomarker identification and humoral response profiling in malaria, and other infectious diseases, have been increasingly adopted [15][16][17][18][19][20][21][22][23][24].
One widely utilised form of the protein microarray is based on an in vitro transcription/translation (IVTT) system [25], in which protein products are produced through a PCR, in vivo recombination cloning and in vitro expression pipeline before being printed onto arrays [15]. In principle, whole organism proteome microarrays can be fabricated simply and quickly, enabling analysis of all potential protein driven immune responses to a pathogen. Cell-free synthesis (CFS) is a technique first established over 50 years ago as a means to dissect the molecular mechanisms around protein expression. More recently, the technique has been used as a high throughput expression platform to explore a number of diverse biological processes [26,27]. At its simplest, the approach utilises the crude extract containing the transcription and translation machinery from the cell, performing the process of protein expression without the constraints of the cell. This allows a wide variety of proteins to be expressed, including those that would be deemed toxic if expression was attempted within the confines of the cell membrane [28]. CFS systems based on Escherichia coli (E. coli) are among the most widely used of the IVTT systems [27] and have helped to transform the narrative around a number of areas, including biomarker discovery for infectious diseases [15,29,30]. Despite the widespread uptake of the approach, some issues remain. These include significant heterogeneity of expression, leading some research groups to describe the mechanisms of the process as a "black box". The heterogeneity between products is not assessed for every target, making it difficult to normalise for reactivity between protein spots, which represent an impure mix of E. coli and target protein. In addition to the E. coli cell-free expression platform, other approaches have been employed in the characterisation of protein targets for immunological assessment. The wheat germ cell-free expression system in particular has also proven to be an important platform in the advancement of biomarker discovery and malaria vaccine research [31][32][33][34], but it is not the focus of the current study.
In contrast to the IVTT array methodology, the printing of purified proteins is cheaper and typically more quantifiable. Uniform amounts of product can therefore be incorporated into arrays, increasing confidence when comparing quantitative antibody responses between antigenic targets [35] and assessing relative immunogenicity. The process can be modified to support the scale up of recombinant proteins, and furthermore, affinity purification of protein targets reduces the risk of undesired background reactivity due to expression system components, and in part truncated proteins. However, the time required to produce panels of purified proteins is far in excess of the IVTT system, particularly for large numbers of targets, unless supported by an automated production platform [36][37][38]. For both the IVTT and purified protein E. coli systems, although the production of complex conformational proteins is possible it can sometimes be a challenge [39,40]. These challenges are in part due to the expression of proteins foreign to the bacteria, the speed at which bacteria express proteins, only partially mitigated with a reduction in expression temperature; and the lack of essential molecular chaperones to aid correct folding/refolding of proteins [41][42][43].
Here we present a comparison between IVTT based and purified proteins on a single microarray. For clarity, proteins produced using the IVTT system will simply be referred to as IVTT proteins, and those produced by conventional E. coli expression will be referred to as purified proteins. Matched malarial protein targets from each methodology were assessed for comparative reactivity in serum from Ugandan participant samples (n = 899) [44] to determine the suitability of each approach in the context of high-throughput profiling of serological responses to protein antigens.

Ethics statement
All serum samples were collected after written informed consent from the participant or their parent/guardian. The protocol for sample collection was reviewed and approved by the Makerere University School of Medicine Research and Ethics Committee (#2011-149 and #2011-167), the London School of Hygiene and Tropical Medicine Ethics Committee (#5943 and #5944), the Durham University School of Biological and Biomedical Sciences Ethics Committee, the University of California, San Francisco, Committee on Human Research (#11-05539 and #11-05995) and the Uganda National Council for Science and Technology (#HS-978 and #HS-1019).

Samples
Sera were originally collected as part of a comprehensive longitudinal surveillance study conducted in three sub-counties in Uganda (Walukuba, Jinja District; Kihihi, Kanungu District, and Nagongera, Tororo). The study design and methods have been previously reported and are described in detail elsewhere [44]. A sub-selection of samples (n = 899) was made from individuals across a breadth of recorded clinical episodes of malaria to ensure a range of seroreactivity.

Protein targets
Purified protein expression. Recombinant proteins were generated and expressed in Escherichia coli as glutathione S-transferase (GST)-tagged fusion proteins using previously described methods: PfMSP1-19 [52]. The exception to this was PfAMA1, which was expressed as a histidine tagged protein in Pichia pastoris [53]. Purification of the expressed proteins was performed using affinity chromatography (Glutathione Sepharose 4B (GE Healthcare Life Sciences) or HisPur Ni-NTA (Invitrogen) resins for GST and His tagged proteins, respectively). Protein concentration was assessed using the Bradford protein assay, with quality and purity assessed by resolution on a 4-20% gradient SDS-PAGE gel.
IVTT protein expression. An IVTT system was used to express proteins of interest as previously described [15]. Briefly, Plasmodium falciparum DNA (3D7 isolate) coding sequences were PCR-amplified and cloned into T7 expression vectors via homologous recombination. Target sequences were expressed at 21˚C for 16h in E. coli-based, cell-free transcription/translation reactions, and products were printed onto arrays as un-purified, whole reaction mixtures.
Overview of compared IVTT and purified protein antigens. We assessed antibody responses to protein targets mapping to eleven antigens (i.e. distinct gene products), each represented on the array by at least one IVTT and one purified protein target. Full details are in Table 1 and S1 Table. The number of purified protein targets varied according to availability, while the number of IVTT targets was dependent on the exon composition of each gene sequence: multiple exon sequences were expressed as multiple protein targets based on exon delineation, whereas single exon gene sequences were generally expressed as a single protein.
As a result, of the 11 antigens investigated, 8 were represented by >1 IVTT or purified protein target; 5 had >1 IVTT protein target (EBA181, HSP40, MSP1, MSP4 and MSP5) and 5 had >1 purified protein target (ACS5, ETRAMP4, ETRAMP5, HSP40 and MSP1). Near identical IVTT proteins (1 terminal amino acid difference in length) were produced independently and printed in parallel as expression controls for two antigens: MSP4 and MSP5. The sequences used in the design and expression of the purified E. coli proteins were generally shorter than those of the equivalent proteins expressed in the IVTT cell-free system. This was done to limit the sequence length to below 1 kb, as expression of sequences larger than 1 kb in E. coli can contribute to poor or failed expression yields [42,43]. Truncation of target sequences was based on in silico mapping of each protein sequence to focus on regions of predicted immunogenicity. Empty GST vectors were expressed and the purified GST was used in background correction for proteins with this tag. The His-tag vector was not expressed, as it has proven impossible to express and purify the 6xhistidine tag in isolation.

PLOS ONE

Protein microarray
Prior to printing, Tween 20 was added to purified proteins to yield a final concentration of 0.001% Tween 20. Arrays were printed onto nitrocellulose-coated slides (AVID, Grace Bio-Labs, Inc., Bend, OR, USA) using an Omni Grid Accent microarray printer (Digilabs, Inc., Marlborough, MA, USA). Alongside proteins of interest, buffer (PBS) and no-DNA (empty T7 vector reactions) were included as controls to allow for background normalisation of purified and IVTT proteins respectively.
Sample probing. For analysis of antibody reactivity on the protein microarray, serum samples were diluted 1:200 in a 3 mg/mL E. coli lysate solution in protein arraying buffer (Maine Manufacturing, Sanford, ME, USA) and incubated at room temperature for 30 min. Arrays were rehydrated in blocking buffer for 30 min. Blocking buffer was removed, and arrays were probed with pre-incubated serum samples using sealed, fitted slide chambers to ensure no cross-contamination of sample between pads. Chips were incubated overnight at 4˚C with agitation. Arrays were washed five times with TBS-0.05% Tween 20, followed by incubation with biotin-conjugated goat anti-human IgG (Jackson ImmunoResearch, West Grove, PA, USA) diluted 1:200 in blocking buffer at room temperature. Arrays were washed three times with TBS-0.05% Tween 20, followed by incubation with streptavidin-conjugated SureLight P-3 (Columbia Biosciences, Frederick, MD, USA) at room temperature protected from light. Arrays were washed three times with TBS-0.05% Tween 20, three times with TBS, and once with water. Arrays were air dried by centrifugation at 500 x g for 5 min and scanned on a GenePix 4300A High-Resolution Microarray Scanner (Molecular Devices, Sunnyvale, CA, USA). Target and background intensities were measured using an annotated grid file (.GAL).
Data normalisation. Microarray spot foreground and local background fluorescence data were imported into R (Foundation for Statistical Computing, Vienna, Austria) for correction, normalisation and analysis. Local background intensities were subtracted from foreground using the backgroundCorrect function of the limma package [54]. The backgroundCorrect function was then further applied to GST-tagged purified proteins, whereby background-corrected GST fluorescence was subtracted from background-corrected target fluorescence to account for any GST-specific reactivity in samples. All data were then Log2 transformed, and the mean signal intensity of buffer and no-DNA control spots was subtracted from purified and IVTT proteins respectively to give a relative measure of reactivity to targets over background (S1 Fig) [20].
Table 1 summarises the purified and IVTT protein targets for each antigen, with further detail in S1 Table. In brief, we assessed IgG antibody responses to 35 antigenic targets, derived from 11 well-characterised P. falciparum protein antigens (distinct gene products). Each antigen was represented by at least one IVTT and one purified protein target.
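The order of operations described under Data normalisation can be illustrated with a minimal sketch. This is not the authors' R/limma code: it is a plain Python stand-in, all function names and intensity values are hypothetical, and two details are assumptions rather than statements from the text, namely that background correction is a simple subtraction and that non-positive corrected intensities are floored at 1 before the Log2 transform.

```python
import math

FLOOR = 1.0  # assumption: floor applied before log2; not specified in the text


def corrected(fg, bg):
    """Local background subtraction for one spot (simple subtraction assumed)."""
    return fg - bg


def log2_floor(value):
    """log2 with a floor so non-positive corrected intensities stay defined."""
    return math.log2(max(value, FLOOR))


def control_mean_log2(control_spots):
    """Mean log2 corrected intensity of control spots given as (fg, bg) pairs."""
    logs = [log2_floor(corrected(fg, bg)) for fg, bg in control_spots]
    return sum(logs) / len(logs)


def normalise_spot(fg, bg, control_spots, gst=None):
    """Log2 signal of one target spot relative to its control spots.

    gst is an optional (fg, bg) pair for the GST spot; it is passed only for
    GST-tagged purified proteins.
    """
    signal = corrected(fg, bg)
    if gst is not None:
        signal -= corrected(*gst)  # remove GST-specific reactivity
    return log2_floor(signal) - control_mean_log2(control_spots)


# Hypothetical raw intensities, for illustration only.
buffer_spots = [(300.0, 44.0), (172.0, 44.0)]  # PBS controls for purified proteins
no_dna_spots = [(140.0, 12.0)]                 # empty T7 vector controls for IVTT

purified = normalise_spot(4396.0, 300.0, buffer_spots, gst=(2348.0, 300.0))
ivtt = normalise_spot(1100.0, 76.0, no_dna_spots)
print(purified, ivtt)
```

The control mean is taken on the Log2 scale here, which follows the wording of the S1 Fig legend; taking the mean before transformation would give slightly different values.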

Magnitude of responses between expression systems
The magnitude of response to all protein targets was compared by antigen to evaluate differences in seroreactivity between IVTT derived and purified protein targets. As expected, responses varied significantly between antigens and between the protein targets mapping to each antigen.
Mean responses to all targets were compared by expression system (Fig 1), revealing a greater range of response to purified proteins (IQR Log2MFI = 3.88-6.40) than IVTT proteins (IQR Log2MFI = 0.46-1.68), and a greater magnitude of response to purified than IVTT targets (p < 0.001). Similarly, the range and median intensity of individual antibody responses were found to be greater for purified proteins than their IVTT counterparts (e.  16-8.52]) for all targets (p < 0.001) except MSP1 Pure_2, which more closely reflected the level of reactivity to the two MSP1 IVTT targets (Fig 2).
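As an illustration of how an interquartile range comparison like the one above can be computed, the following sketch uses only the Python standard library. The function name and the input values are hypothetical and are not data from this study; the quartile convention (the "inclusive" method) is also an assumption, since the text does not state which one was used.

```python
import statistics


def iqr_bounds(values):
    """Return (Q1, Q3) of a list of Log2 MFI values using inclusive quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


# Hypothetical mean Log2 MFI values per target, for illustration only.
purified_means = [3.0, 4.0, 5.0, 6.0, 7.0]
ivtt_means = [0.0, 0.5, 1.0, 1.5, 2.0]

pur_q1, pur_q3 = iqr_bounds(purified_means)
ivtt_q1, ivtt_q3 = iqr_bounds(ivtt_means)

# Width of the IQR is one simple summary of the range of response.
print("purified IQR:", pur_q1, pur_q3, "width", pur_q3 - pur_q1)
print("IVTT IQR:", ivtt_q1, ivtt_q3, "width", ivtt_q3 - ivtt_q1)
```

Different quartile conventions give slightly different bounds on small samples, so the same convention should be applied to both expression systems.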

Correlation of responses between antigen matched targets
Considering all at least partially sequence matched IVTT and purified protein targets (i.e. excluding pairwise comparisons where purified protein sequences were completely non-overlapping with the IVTT sequence for the same antigen), there was no evidence for a general correlation in mean response between expression platforms (Spearman's rho (r_s) = 0.279, p = 0.23). Antibody responses to all protein targets for each antigen were therefore compared

overlap by 17 amino acids, equivalent to a small peptide in terminal regions unlikely to cover immunogenic epitopes. As such, these targets were considered non-overlapping. For MSP4 and MSP5, duplicate IVTT protein products were generated for each gene, with each duplicate protein identical to the other except for the omission of one [N- or C-] terminal amino acid. These respective targets resulted in near perfect correlation of antibody responses (MSP4 r_s = 1.00, p < 0.001; MSP5 r_s = 0.94, p < 0.001). Multiple purified protein targets were produced for ACS5, ETRAMP4, ETRAMP5, HSP40 and MSP1, none of which overlap. Correlation between these purified protein targets in each antigen varied between 0.31 and 0.59 (S2 Fig).
For the 8 antigens with >1 IVTT or purified protein target, the greatest level of correlation was found between an IVTT and purified target in 4/8 instances; between two IVTT targets (IVTT-IVTT) in 3/8 instances; and between two purified targets (purified-purified) in 1/8 instances (S2 Table). Comparing correlations between antigen-matched IVTT and purified proteins only, overlapping targets correlate more highly than non-overlapping targets. Sample sizes were too low to test the significance of this trend within antigens (Fig 4).
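The correlations reported in this section are Spearman rank correlations. A minimal sketch of the statistic is given below, computed as the Pearson correlation of average ranks. It is a standard-library Python illustration and not the code used in this study; the function names and the example responses are hypothetical, and p-values are not computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-sample Log2 MFI responses to two targets of one antigen.
ivtt_target = [0.2, 0.9, 1.4, 0.5, 2.1]
purified_target = [3.1, 5.0, 6.2, 4.4, 7.9]
print(spearman_rho(ivtt_target, purified_target))
```

Because the statistic depends only on rank order, two targets with very different response magnitudes can still correlate strongly if samples are ordered similarly on both.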

Discussion
Protein microarrays are a practical approach to the serological screening of large numbers of putative malaria antigen biomarkers. The throughput and flexibility of the microarray platform presents an opportunity to interrogate malarial antibody responses at a scale far exceeding traditional mono- or multiplex approaches, agnostic of predicted immunological targets. Here we have evaluated matched antigenic targets produced using two E. coli-based expression techniques, in vitro transcription/translation (IVTT) and purified whole-cell recombinants, in the context of a protein microarray. We found that the magnitude of antibody responses to purified protein targets was generally higher than for their IVTT counterparts, and that correlation between protein target pairs at the individual serum sample level was variable and related to degree of sequence homogeneity between targets. Our findings warn against direct comparisons of microarray data from proteins produced in different expression platforms without careful cross-validation of sequences and allelic types. However, our data do provide support for the use of both IVTT and purified protein microarray platforms in the context of early-stage antigen biomarker identification to feed into experimental pipelines where candidate proteins may be interrogated by methods providing higher resolution analysis.
In building this study, we predicted that the magnitude of responses to IVTT products, which tended to be longer, often representing single exon sequences and therefore potentially containing more epitopes, would be greater than to purified targets truncated based on species-specificity or domain boundaries, which potentially represented fewer epitopes. Contrary to this prediction, we found that purified proteins captured a greater range and magnitude of responses (Purified, IQR Log2MFI = 3.88-6.40; IVTT, IQR Log2MFI = 0.46-1.68; p < 0.001). The greater level of reactivity to purified targets may relate to differences in the amount of protein deposited on the array, where consistent and defined amounts of purified protein are spotted in contrast to the unquantified, and likely variable, IVTT products. These findings recommend a degree of caution in the interpretation of array data from two different platforms. For example, MSP5 showed the second highest mean MFI of any purified protein, but among the lowest mean MFI of any IVTT protein.
In addition to differences in the magnitude of mean responses to targets stratified by expression system, we observed a greater range of individual sample responses, stratified by antigen, to purified proteins than in sequence matched IVTT-expressed targets (e.g. protein [53,55] and this observation is likely a reflection of antibody reactivity to correctly folded (P. pastoris) and incorrectly folded AMA1 (IVTT). We acknowledge that a lack of correct folding in other purified and IVTT products may impact on epitope recognition by antibodies raised to native protein during infection. However, human antibody responses are composed of a polyclonal response to each antigen, which will include both conformational and linear epitopes. Whilst questions remain about the appropriateness of using unfolded protein fragments in serological screens, such reagents remain the most widely utilised and efficient approach in this context at present. Considering all antigenic targets together, we found no evidence of correlation in mean reactivity to sequence matched targets between expression systems (r_s = 0.28, p = 0.23). In the context of this study, this was not unexpected, taking into account the differences observed in magnitude of response between IVTT and purified proteins, and that the length of native protein sequence coverage between IVTT and purified targets was highly variable. More broadly, it is perhaps less reassuring that matched targets derived from different expression systems lack more obvious relationships in antibody response than have been demonstrated in other studies [17,56], though Kobayashi et al. report relatively similar results for a smaller number of targets expressed in E. coli (purified proteins) and IVTT systems specifically [30]. It is likely that protein concentration disparities between the two approaches are one of the drivers of this heterogeneity. However, without attempting to quantify the exact amount of protein generated in the small volume of IVTT reactions, we are unable to address this here. Although in this current study targets grouped by antigen displayed highly variable correlations of response, it is encouraging that sequence matched proteins did generally display stronger correlations of response than non-sequence matched targets. Further, this may indicate the importance of capturing specific epitopes within expression sequences when producing antigens by either expression method.
The IVTT system lends itself to microarray applications, as vast numbers of proteins, or even entire proteomes, may be produced at scale relatively quickly. However, for application to serology there is concern that expressed proteins are not quantified before printing, and that expression levels of product may vary considerably; product yield in bacterial-based IVTT systems is generally considered to be lower (typically ~1 mg/mL or less), though higher protein yields have been reported [71,72]. This has been shown to be due to an inherent heterogeneity in IVTT components, although this weakness is an area of active research [26,73]. Similarly, it is important to acknowledge that the un-purified nature of printed reaction mixtures may mask, or otherwise adversely affect, the detection of antibody reactivity in a sample; Davies et al. report IVTT reaction compositions of 99% E. coli lysate to 1% target protein [15], though this will vary considerably, at scale, in practice.
In contrast to IVTT-based microarrays, printing purified protein allows a highly quantifiable approach to be taken. Affinity purification and dialysis of expression products substantially reduces the risk of background reactivity to bacterial components, and the simple determination of target protein concentrations allows defined quantities of product to be spotted, providing much greater confidence when comparing reactivity between targets. However, these advantages come at a substantial cost; the need for in silico analysis to design vectors, transfection procedures, expression and purification drastically slows the rate at which putative targets can be produced and screened. Shorter, epitope specific sequences may in theory be transposed from IVTT systems with a view to generating more granular serological screens, though we accept that truncated protein targets will in some cases favour linear B cell epitopes, while missing conformational epitopes. However, for measuring exposure to infection, targeting conformational epitopes is less important than it would be for protective epitopes [57].
The primary benefit of the microarray platform is the ability to screen orders of magnitude more targets simultaneously than more standard serological assays. Our analysis shows that both IVTT and purified proteins can be successfully used to capture malarial protein-antigen specific antibody responses on a protein microarray. Although correlations of response between expression systems are not as strong as may have been expected, a number of acknowledged technical differences in the methods of protein production may account for this finding. In addition to the E. coli in vivo and IVTT systems utilised here, high-throughput wheat germ cell free systems have been successfully used to conduct large scale serological screens of putative antigen biomarkers [74,75], alongside chemically synthesised peptide arrays [57,62]. High-throughput mammalian and baculovirus expression systems have also been pioneered for the production of recombinant proteins [36,76]. Differences in expression efficiency and the homology to native epitopes achieved by the assortment of available approaches likely have considerable impact on the capture of antibody from sample. This variability should be accounted for both when choosing an experimental approach and when comparing results between different methods. We suggest that further investigation of differences in seroreactivity to sequence-matched proteins derived from contrasting expression systems is needed to shed light on the parity between such data that is already widely published. It should also be noted that it is unlikely that any single expression platform will satisfy the demands of all recombinant expression projects, because the importance of factors such as protein folding, protein activity (e.g. enzymes) and glycosylation varies between projects. In addition, E. coli expression has the advantage of low cost, flexibility and easy scale-up.
Considering the data presented here more broadly, observed trends lend support to the utilisation of both IVTT and purified arrays depending on the objectives and context of hypotheses to be investigated. The strengths and weaknesses of each expression system should dictate the chosen approach on a case-by-case basis. For example, very high-density proteome level screening to identify 'shortlists' of candidate markers based on binary categorisation of seropositivity may be best achieved using IVTT systems. In contrast, smaller numbers of 'shortlisted' targets expressed as purified proteins may allow for more nuanced characterisation of antibody responses on a more continuous scale. As already described, the key limitation in the production of purified recombinants in our current expression pipeline is throughput. The adaption of our methods to increase the capacity of protein production would improve our ability to more widely mine the biomarker information derived from the IVTT platform. As such, we are currently exploring a number of existing approaches to address this methodological bottleneck [38,77].
In summary, the IVTT protein microarray approach has proven to be a powerful, high-throughput biomarker discovery platform with applicability across a range of infectious diseases. When combined with a cheap, scalable and flexible protein expression platform such as the E. coli in vivo expression platform, we have the ability to mine potential diagnostic and vaccine related targets.

Supporting information
S1 Fig. Data normalisation processes for IVTT and purified protein spots. After local background correction using the backgroundCorrect function from the limma package, purified protein spots were additionally corrected for possible GST reactivity by subtracting GST reactivity using the same function. After Log2 transformation, IVTT and purified proteins were normalised to background control spots of empty T7 vector and PBS buffer control spots respectively. (PDF)